Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus

Authors

  • Issa Atoum
  • Ahmed Otoom
Abstract

Text similarity plays an important role in natural language processing tasks such as question answering and text summarization. Current state-of-the-art text similarity algorithms rely on inefficient word pairings and/or knowledge derived from large corpora such as Wikipedia. This article evaluates previous word similarity measures on benchmark datasets and then uses a hybrid word similarity measure in a novel text similarity measure (TSM). The proposed TSM is based on information content and WordNet semantic relations. It takes into account exact word matches, the lengths of both sentences in a pair, and the maximum similarity between a word and the compared text. Compared with other well-known measures, TSM performs better than or comparably to the best algorithms in the literature.

Keywords: text similarity; distributional similarity; information content; knowledge-based similarity; corpus-based similarity; WordNet
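The abstract names three ingredients (exact word match, the lengths of both sentences, and the maximum similarity between a word and the compared text) but does not give the formula. The following is only a minimal sketch of how such ingredients could be combined, not the authors' TSM: the length-weighted averaging, the function names, and the tiny `TOY_SIMILARITY` lexicon (a stand-in for a hybrid WordNet and corpus word measure) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy lexicon standing in for a real hybrid word similarity
# measure (for example one built on WordNet relations and information content).
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"boy", "lad"}): 0.8,
}


def toy_word_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, a lexicon score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def directed_score(source: List[str], target: List[str],
                   word_sim: Callable[[str, str], float]) -> float:
    """Average, over source words, of the best match found in target.

    An exact match scores 1.0 without calling word_sim; otherwise the
    maximum similarity between the word and any target word is used.
    """
    if not source or not target:
        return 0.0
    target_set = set(target)
    total = 0.0
    for word in source:
        if word in target_set:
            total += 1.0
        else:
            total += max(word_sim(word, other) for other in target)
    return total / len(source)


def text_similarity(s1: str, s2: str,
                    word_sim: Callable[[str, str], float] = toy_word_similarity
                    ) -> float:
    """Length-weighted mean of the two directed scores (an assumed combination)."""
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    n1, n2 = len(t1), len(t2)
    forward = directed_score(t1, t2, word_sim)
    backward = directed_score(t2, t1, word_sim)
    return (n1 * forward + n2 * backward) / (n1 + n2)
```

Calling `text_similarity("A car", "An automobile")` with the toy lexicon scores the pair through the `car`/`automobile` entry only, since the articles do not match. In practice `word_sim` would be replaced by a real word similarity measure; the short-circuit on exact matches avoids calling it for words that already appear in both texts.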

Similar resources

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

A Study of Hybrid Similarity Measures for Semantic Relation Extraction

This paper describes several novel hybrid semantic similarity measures. We study various combinations of 16 baseline measures based on WordNet, Web as a corpus, corpora, dictionaries, and encyclopedia. The hybrid measures rely on 8 combination methods and 3 measure selection techniques and are evaluated on (a) the task of predicting semantic similarity scores and (b) the task of predicting sema...

Semantic Searching and Ranking of Documents using Hybrid Learning System and WordNet

Semantic searching seeks to improve search accuracy of the search engine by understanding searcher’s intent and the contextual meaning of the terms present in the query to retrieve more relevant results. To find out the semantic similarity between the query terms, WordNet is used as the underlying reference database. Various approaches of Learning to Rank are compared. A new hybrid learning sys...

Using Semantic Similarity To Acquire Cooccurrence Restrictions From Corpora

We describe a method for acquiring semantic cooccurrence restrictions for tuples of syntactically related words (e.g. verb-object pairs) from text corpora automatically. This method uses the notion of semantic similarity to assign a sense from a dictionary database (e.g. WordNet) to ambiguous words occurring in a syntactic dependency. Semantic similarity is also used to merge disambiguated word...

Measuring Semantic Distance using Distributional Profiles of Concepts

Automatic measures of semantic distance can be classified into two kinds: (1) those, such as WordNet, that rely on the structure of manually created lexical resources and (2) those that rely only on co-occurrence statistics from large corpora. Each kind has inherent strengths and limitations. Here we present a hybrid approach that combines corpus statistics with the structure of a Roget-like th...

Publication date: 2016